Maximum patterns in datasets

Authors

  • Tibérius O. Bonates
  • Peter L. Hammer
  • Alexander Kogan
Abstract

Given a binary dataset of positive and negative observations, a positive (negative) pattern is a subcube having a nonempty intersection with the positive (negative) subset of the dataset, and an empty intersection with the negative (positive) subset of the dataset. Patterns are the key building blocks in Logical Analysis of Data (LAD), and are an essential tool in identifying the positive or negative nature of “new” observations covered by them. We develop exact and heuristic algorithms for constructing a pattern of maximum coverage which includes a given point. It is shown that the heuristically constructed patterns can achieve more than 98% of the maximum possible coverage, while requiring only a fraction of the computing time of the exact algorithm. Maximum patterns are shown to be useful for constructing highly accurate LAD classification models. In comparisons with the commonly used machine learning algorithms implemented in the publicly available Weka software package, the implementation of LAD using maximum patterns is shown to be a highly competitive classification method.

Acknowledgements: T. Bonates gratefully acknowledges the partial support of a DIMACS Graduate Student Award. We acknowledge the assistance provided by Dash Optimization by allowing the use of its integer and linear programming solver Xpress-MP within its Academic Partnership Program.
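To make the pattern definition in the abstract concrete, the following Python sketch grows a positive pattern around a given point by greedily dropping literals. It is only an illustration under stated assumptions: the function names (covers, coverage, greedy_max_pattern), the dict-based subcube representation, and the greedy rule itself are hypothetical choices made here, not the exact or heuristic algorithms developed in the paper.

```python
def covers(pattern, point):
    """True if `point` lies in the subcube whose fixed bits are `pattern` (index -> bit)."""
    return all(point[i] == v for i, v in pattern.items())


def coverage(pattern, points):
    """Number of observations in `points` covered by `pattern`."""
    return sum(covers(pattern, p) for p in points)


def greedy_max_pattern(alpha, positives, negatives):
    """Greedy sketch (an assumption, not the paper's method): start from the
    subcube containing only `alpha`, then repeatedly free the one variable that
    keeps every negative observation uncovered and yields the largest positive
    coverage. Stops when no variable can be freed."""
    pattern = dict(enumerate(alpha))
    if any(covers(pattern, q) for q in negatives):
        raise ValueError("alpha coincides with a negative observation")
    while True:
        best_i, best_cov = None, -1
        for i in pattern:
            trial = {j: v for j, v in pattern.items() if j != i}
            if any(covers(trial, q) for q in negatives):
                continue  # freeing variable i would admit a negative point
            cov = coverage(trial, positives)
            if cov > best_cov:
                best_i, best_cov = i, cov
        if best_i is None:
            return pattern
        del pattern[best_i]


# Toy dataset (made up for illustration).
positives = [(1, 1, 0), (1, 0, 0), (1, 1, 1)]
negatives = [(0, 1, 0), (0, 0, 1)]
print(greedy_max_pattern((1, 1, 0), positives, negatives))  # {0: 1}
```

On this toy dataset the result fixes only the first variable to 1, which covers all three positive observations and neither negative one. A greedy pass like this gives no optimality guarantee; the abstract's exact algorithm is what certifies maximum coverage.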

Similar resources

Spatio-temporal variability of aerosol characteristics in Iran using remotely sensed datasets

The present study is the first attempt to examine temporal and spatial characteristics of aerosol properties and classify their modes over Iran. The data used in this study include the records of Aerosol Optical Depth (AOD) and Angstrom Exponent (AE) from MODerate Resolution Imaging Spectroradiometer (MODIS) and Aerosol Index (AI) from the Ozone Monitoring Instrument (OMI), obtained from 2005 t...

High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences

High fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns whose utility meets a user-specified minimum high-utility threshold. It comprises extracting patterns which are highly accessed in mobile web service sequences. Different from the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...

MINING FUZZY TEMPORAL ITEMSETS WITHIN VARIOUS TIME INTERVALS IN QUANTITATIVE DATASETS

This research aims at proposing a new method for discovering frequent temporal itemsets in continuous subsets of a dataset with quantitative transactions. It is important to note that although these temporal itemsets may have relatively high support or occurrence within particular time intervals, they do not necessarily get similar support across the whole dataset, which makes i...

Common Spatial Patterns Feature Extraction and Support Vector Machine Classification for Motor Imagery with the SecondBrain

Recently, large sets of electroencephalography (EEG) data have been generated by several high-quality labs worldwide and are freely available to all researchers. On the other hand, many neuroscience researchers need these data to study different neural disorders for better diagnosis and for evaluating treatment. However, some format adaptation and pre-processing are necessary before ...

Constraint-Based Mining of Episode Rules and Optimal Window Sizes

Episode rules are patterns that can be extracted from a large event sequence, to suggest to experts possible dependencies among occurrences of event types. The corresponding mining approaches have been designed to find rules under a temporal constraint that specifies the maximum elapsed time between the first and the last event of the occurrences of the patterns (i.e., a window size constraint)...

Journal:
  • Discrete Applied Mathematics

Volume 156

Publication date: 2008